Return-Path: <icon-group-sender>
Received: (from root@localhost) by baskerville.CS.Arizona.EDU (8.11.1/8.11.1) id h1RJNMc10038 for icon-group-addresses; Thu, 27 Feb 2003 12:23:22 -0700 (MST)
Message-Id: <200302271923.h1RJNMc10038@baskerville.CS.Arizona.EDU>
Subject: Help with high level guidance on text searching algorithms
To: icon-group@cs.arizona.edu
From: "David Gamey" <dgamey@ca.ibm.com>
Date: Thu, 27 Feb 2003 10:34:33 -0500
X-MIMETrack: Serialize by Router on D01ML391/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 02/27/2003 10:33:54 AM
Errors-To: icon-group-errors@cs.arizona.edu
Status: RO

Hi all,

I've been checking some links for different algorithms and how they apply to different problems. The field has really exploded since the last time I looked in detail. A quick poke about in the IPL didn't turn up anything. I did find lots of detailed links (off the agrep site), but right now I really want to see the forest (not the trees, branches and roots). Perhaps someone could give me a pointer or two.

The problem I'm looking at is related to searching through sets of text looking for commonality (substrings, not patterns, although patterns would be of secondary interest). I'm not looking for optimal or minimal sets of substrings, just relatively good matches. Given a set of messages, I'd like to be able to categorize them into sets that share common characteristics: probably this, probably that. This has to be similar to what search engines and spam filters do.

Thanks in advance.

David
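
A minimal sketch of one way to read the grouping problem described above, written in Python rather than Icon. It is only an illustration under stated assumptions: messages are compared by the fixed-length substrings ("shingles") they share, similarity is the Jaccard ratio of those substring sets, and grouping is a simple greedy pass. The function names (`shingles`, `jaccard`, `group_messages`), the substring length `n=5`, and the `threshold=0.3` cutoff are all hypothetical choices, not anything taken from the message or from the IPL.

```python
def shingles(text, n=5):
    """Return the set of all length-n substrings of the normalized text.

    Normalization (an assumption): lowercase and collapse whitespace.
    """
    norm = " ".join(text.lower().split())
    if len(norm) < n:
        # Text shorter than n: treat the whole text as a single shingle.
        return {norm} if norm else set()
    return {norm[i:i + n] for i in range(len(norm) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, 0.0 if both empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_messages(messages, n=5, threshold=0.3):
    """Greedy single-pass grouping by shared substrings.

    Each message joins the first existing group whose representative
    (the group's first message) is at least `threshold` similar;
    otherwise it starts a new group. Returns lists of message indices.
    """
    groups = []  # each entry: (representative shingle set, member indices)
    for idx, msg in enumerate(messages):
        sh = shingles(msg, n)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((sh, [idx]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = [
        "Disk quota exceeded for user alice on volume /home",
        "Disk quota exceeded for user bob on volume /home",
        "Scheduled maintenance window begins at midnight",
        "Scheduled maintenance window begins at noon",
    ]
    for members in group_messages(sample):
        print(members)
```

The greedy pass is order-dependent and compares only against each group's first member, so it gives "relatively good" groupings rather than optimal ones. Raising `threshold` or `n` makes groups stricter; lowering them makes groups looser.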